Abstract
Forecasting the immediate future of financial data is a necessary step toward profitable investing or hedging strategies. Because of its importance, multiple models and theories have been produced to explain the behaviour and future values of financial asset prices. This work expands on two of these models and compares them with real-world data, to validate or dismiss their results through experiments and graphs.
In economics, asset pricing refers to the formal treatment and development of two main pricing principles: equilibrium asset pricing and rational asset pricing. The first one derives prices from the market principles of supply and demand. These principles and specific models are important for understanding the pricing mechanisms in seemingly complex financial markets. Asset pricing models are also important tools for individuals and corporations in analysing and solving a number of financial concerns and decisions (Munk, 2013).
This work compares the theoretical results of the proposed models with real-world data.
The first model to be explored is the Asset Pricing Theory Model (APT), as proposed in the subject that originated this work (Muñoz-Elguezabal, 2017). This model proposes that the price of an asset can be modeled as a discrete-time martingale stochastic process. Thus, the model proposes the following equation to estimate the future price of the asset: $$P_t = E[P_{t+1}]$$ In other words, the best estimator for the future price of the asset is its current price.
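A minimal sketch of how the martingale statement can be checked on a price series, assuming a successful prediction means that the next observed price equals the current one. The helper name `martingale_hit_ratio` and the toy prices are illustrative assumptions; they are not taken from the `functions` module used later in the notebook.

```python
import pandas as pd


def martingale_hit_ratio(prices: pd.Series) -> dict:
    """Count how often the next price equals the current price.

    Hypothetical helper: 'e1' counts P_{t+1} == P_t, 'e2' counts P_{t+1} != P_t.
    """
    current = prices.iloc[:-1].to_numpy()    # P_t
    following = prices.iloc[1:].to_numpy()   # P_{t+1}
    hits = int((current == following).sum())
    total = len(current)
    ratio = hits / total if total else float("nan")
    return {"e1": hits, "e2": total - hits, "ratio": ratio}


# Toy mid prices (made up for illustration)
toy_prices = pd.Series([100.0, 100.0, 100.5, 100.5, 100.5, 101.0])
result = martingale_hit_ratio(toy_prices)
print(result)
```

Exact equality is a strict criterion; a tolerance could be used instead if prices were not quoted on a fixed tick grid.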
The second model to be explored is the Roll Model, as presented in its original paper, "A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market" by Richard Roll. This model states that the bid-ask spread can be efficiently modeled as: $$\text{spread} = \sqrt{-\text{cov}}$$ where $\text{cov}$ is the first-order serial autocovariance of the price changes of the orderbooks (Roll, 1984).
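The estimator can be sketched in a few lines of NumPy. This is an illustrative sketch, not the notebook's `functions.Model2`: the name `roll_spread` and the `scale` parameter are assumptions. With `scale=1.0` it follows the formula as written above; Roll (1984) states the effective spread with a factor of 2, which corresponds to `scale=2.0`.

```python
import numpy as np


def roll_spread(prices, scale=1.0):
    """Estimate the spread from the lag-1 autocovariance of price changes.

    Hypothetical helper. Returns nan when the autocovariance is not negative,
    because the square root is then undefined (or zero).
    """
    changes = np.diff(np.asarray(prices, dtype=float))
    if changes.size < 3:
        return float("nan")
    # Sample covariance between each price change and the previous one
    cov = np.cov(changes[1:], changes[:-1])[0, 1]
    if cov >= 0:
        return float("nan")
    return scale * float(np.sqrt(-cov))


# Toy series that bounces between two price levels (made up for illustration)
bouncing = [100, 101, 100, 101, 100, 101, 100]
estimate = roll_spread(bouncing)
print(estimate)
```

A series that only trends in one direction has no negative autocovariance, so the sketch returns `nan` for it rather than a spread.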
This work uses real-world BTC/USD 1-hour trading data, calculates the values proposed by the models and compares them to the actual data, with Python as the programming language. Results are presented with DataFrames and graphs, in order to draw conclusions about the theoretical results of the models versus their performance.
Because these are older models taken from the literature, they are expected to fit the data only within a certain range: some predictions should succeed, while others will show errors.
To run this notebook, the packages listed in the requirements.txt file must be installed.
The notebook also depends on the local modules imported below: data, functions and visualizations.
%%capture
# Install all the pip packages in requirements.txt (uncomment to run; requires `import sys`)
#!{sys.executable} -m pip install -r requirements.txt
import numpy as np
import pandas as pd
import data as dt
import functions
import visualizations
import plotly.io as pio
pio.renderers.keys()
import plotly
plotly.offline.init_notebook_mode()
# DataFrame Head
# Obtaining JSON file from data library (Filename: files/orderbooks_05jul21)
data_ob = dt.ob_data
# Orderbook timestamps
ob_ts = list(data_ob.keys())
# Timestamp listings
l_ts = [pd.to_datetime(i_ts) for i_ts in ob_ts]
# Metrics from Functions library
ob_df,_,_ = functions.df_metrics(data_ob)
# Mid prices from the metrics DataFrame
midprices = ob_df["Mid Price"]
pd.DataFrame(midprices.head())
| | Mid Price |
|---|---|
| 2021-07-05 13:06:46.571000+00:00 | 28272.5 |
| 2021-07-05 13:06:47.918000+00:00 | 28272.5 |
| 2021-07-05 13:06:49.414000+00:00 | 28272.5 |
| 2021-07-05 13:06:51.077000+00:00 | 28276.5 |
| 2021-07-05 13:06:52.426000+00:00 | 28276.5 |
# Midprices Histogram
pd.DataFrame(midprices).hist(bins="auto");
As stated in the introduction, this model consists of testing whether the asset price follows a martingale process. To check whether the data is consistent with the model, we evaluate the martingale property on the aforementioned data and display the results in a DataFrame.
The results will also be evaluated in two further experiments: one that segments the data by minute, and one that uses the weighted mid price.
# Martingale results for the entire data:
# Metrics from Functions library
ob_df,_,_ = functions.df_metrics(data_ob)
# Mid prices from the metrics DataFrame
midprices = ob_df["Mid Price"]
# Asset pricing model: Best estimator for future prices is current price
# Is it valid most of the time?
M1_e1 = functions.Model1_E1(midprices)
M1_e1
| | amount | ratio |
|---|---|---|
| e1 | 1763.0 | 0.73 |
| e2 | 637.0 | 0.27 |
| total | 2400.0 | 2400.00 |
As stated, the data will now be segmented into the trades that occurred in each particular minute along the entire span of the data.
# Second Experiment (Every Minute Data)
M1_E2_1,M1_E2_2 = functions.Model1_E2(midprices)
M1_E2_1.head()
| | e1 | e2 | total | ratio1 | ratio2 |
|---|---|---|---|---|---|
| 13:6 | 6 | 2 | 8 | 0.750000 | 0.250000 |
| 13:7 | 27 | 13 | 40 | 0.675000 | 0.325000 |
| 13:8 | 31 | 8 | 39 | 0.794872 | 0.205128 |
| 13:9 | 27 | 11 | 38 | 0.710526 | 0.289474 |
| 13:10 | 30 | 10 | 40 | 0.750000 | 0.250000 |
M1_E2_2
| | Total trades | E1 Ratio Mean | E2 Ratio Mean |
|---|---|---|---|
| 0 | 2400 | 0.743515 | 0.256485 |
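The per-minute segmentation can be sketched with a pandas `groupby` on the timestamps floored to the minute. This is a sketch under assumptions: `hits_per_minute` is a hypothetical helper (not the notebook's `functions.Model1_E2`), the series is assumed to have a `DatetimeIndex`, and the toy timestamps and prices are made up.

```python
import pandas as pd


def hits_per_minute(prices: pd.Series) -> pd.DataFrame:
    """Per-minute counts of unchanged (e1) and changed (e2) next prices.

    Hypothetical helper; assumes `prices` is indexed by a DatetimeIndex.
    """
    # True where the next observation equals the current one;
    # the last row is dropped because it has no successor
    unchanged = (prices.shift(-1) == prices).iloc[:-1]
    minute = unchanged.index.floor("min")
    grouped = unchanged.groupby(minute)
    table = pd.DataFrame({"e1": grouped.sum(), "total": grouped.count()})
    table["e2"] = table["total"] - table["e1"]
    table["ratio1"] = table["e1"] / table["total"]
    table["ratio2"] = table["e2"] / table["total"]
    return table[["e1", "e2", "total", "ratio1", "ratio2"]]


# Toy mid prices with timestamps (made up for illustration)
index = pd.to_datetime([
    "2021-07-05 13:06:50", "2021-07-05 13:06:55",
    "2021-07-05 13:07:05", "2021-07-05 13:07:20", "2021-07-05 13:07:40",
])
toy = pd.Series([10.0, 10.0, 10.5, 10.5, 10.5], index=index)
per_minute = hits_per_minute(toy)
print(per_minute)
print(per_minute[["ratio1", "ratio2"]].mean())
```

The mean of the per-minute ratios need not equal the ratio computed over all the data, because minutes contain different numbers of observations.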
For this experiment, the martingale process is evaluated with the weighted mid prices, both for the entire data and for the minute-segmented data.
# Experiment 3: Martingale Process with Weighted MidPrice
M1_E3_1,M1_E3_2,M1_E3_3 = functions.Model1_E3(ob_df)
M1_E3_1
| | amount | ratio |
|---|---|---|
| e1 | 1622.0 | 0.68 |
| e2 | 778.0 | 0.32 |
| total | 2400.0 | 2400.00 |
M1_E3_2.head()
| | e1 | e2 | total | ratio1 | ratio2 |
|---|---|---|---|---|---|
| 13:6 | 6 | 2 | 8 | 0.750000 | 0.250000 |
| 13:7 | 27 | 13 | 40 | 0.675000 | 0.325000 |
| 13:8 | 26 | 13 | 39 | 0.666667 | 0.333333 |
| 13:9 | 26 | 12 | 38 | 0.684211 | 0.315789 |
| 13:10 | 27 | 13 | 40 | 0.675000 | 0.325000 |
M1_E3_3
| | Total trades | E1 Ratio Mean | E2 Ratio Mean |
|---|---|---|---|
| 0 | 2400 | 0.686378 | 0.313622 |
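One common definition of a volume-weighted mid price weights each side's price by the opposite side's volume. The sketch below uses that definition as an assumption; the notebook's `functions.Model1_E3` may compute the weighted mid price differently, and the function name, column names and values here are all illustrative.

```python
import pandas as pd


def weighted_mid_price(bid, ask, bid_volume, ask_volume):
    """Volume-weighted mid price (assumed definition, not confirmed by the source).

    Each side's price is weighted by the opposite side's volume, so the result
    leans toward the ask when bid volume dominates and toward the bid otherwise.
    """
    return (bid_volume * ask + ask_volume * bid) / (bid_volume + ask_volume)


# Toy top-of-book values (made up for illustration)
book = pd.DataFrame({
    "bid": [100.0, 100.0],
    "ask": [102.0, 102.0],
    "bid_volume": [1.0, 3.0],
    "ask_volume": [1.0, 1.0],
})
book["weighted_mid"] = weighted_mid_price(
    book["bid"], book["ask"], book["bid_volume"], book["ask_volume"]
)
print(book)
```

With equal volumes on both sides this reduces to the plain mid price, which is why the two experiments can give different success ratios only when the book is imbalanced.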
As we've seen, this model states that the spread of the orderbook can be estimated using the first-order (one-step shift) autocovariance of the price changes. For this model, the price considered is the Mid Price, and the results will be compared with the actual spread observed in the orderbook.
M2_1,M2_2 = functions.Model2(pd.DataFrame(midprices),ob_df)
M2_1.head()
| | Spread (OB) | Calculated Spread | Bid | Mid | Ask | Calc Bid | Calc Ask |
|---|---|---|---|---|---|---|---|
| 2021-07-05 13:06:46.571000+00:00 | 5.0 | 0.070044 | 28267.5 | 28272.5 | 28277.5 | 28272.429956 | 28272.570044 |
| 2021-07-05 13:06:47.918000+00:00 | 5.0 | 0.070044 | 28267.5 | 28272.5 | 28277.5 | 28272.429956 | 28272.570044 |
| 2021-07-05 13:06:49.414000+00:00 | 5.0 | 0.070044 | 28267.5 | 28272.5 | 28277.5 | 28272.429956 | 28272.570044 |
| 2021-07-05 13:06:51.077000+00:00 | 3.0 | 0.070044 | 28273.5 | 28276.5 | 28279.5 | 28276.429956 | 28276.570044 |
| 2021-07-05 13:06:52.426000+00:00 | 3.0 | 0.070044 | 28273.5 | 28276.5 | 28279.5 | 28276.429956 | 28276.570044 |
M2_2
| | Spread Mean | Spread Variance | Calculated Spread | Spread Difference |
|---|---|---|---|---|
| 0 | 3.946272 | 6.021695 | 0.070044 | 3.876229 |
Model 1, or APT, correctly estimated the future price of the trade most of the time. Although this model is not suited for estimations more than one step into the future, it might be useful for trading systems.
Model 2, or the Roll Model, was unsuccessful in estimating the spread of the orderbook. This might be due to the magnitude of the asset's price.
We can further observe the results of the first model by comparing the mid prices with the same prices shifted one step into the future, to see how the APT model estimates the tendency of the price.
As we can observe, most of the time the price is the same, which supports APT as an adequate model.
As we can observe, relative to the total number of trades in the orderbook data, with e1 being successful predictions and e2 being unsuccessful predictions, predictions were successful most of the time.
Experiment 1:
M1_e1
| | amount | ratio |
|---|---|---|
| e1 | 1763.0 | 0.73 |
| e2 | 637.0 | 0.27 |
| total | 2400.0 | 2400.00 |
Evaluating the martingale prediction success rate for all data, we can observe that 73% of the time we can accurately predict what the next price will be.
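As a quick check of the reported ratios, using the counts in the table above:

$$\frac{e_1}{\text{total}} = \frac{1763}{2400} \approx 0.7346, \qquad \frac{e_2}{\text{total}} = \frac{637}{2400} \approx 0.2654$$

Both values round to the 0.73 and 0.27 shown in the table.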
Results of Experiment 2:
e2_graph = visualizations.APT_graph(M1_E2_1)
e2_graph.show(renderer="notebook")
As we can see in this graph, successful martingale predictions make up the majority of the results for every minute, so we can consider this model satisfactory for making one-step forecasts of the mid prices.
Results of Experiment 3:
e3_graph = visualizations.APT_graph_w(M1_E3_2)
e3_graph.show(renderer="notebook")
These images present, in an order reminiscent of the previous graph, the success rates of the Model 1 evaluation over all the data, this time over volume-weighted mid prices. As we can see, successful predictions were again the majority, with an overall success rate of about 68% according to the Experiment 3 table.
We can consider the predictions of Model 1 acceptable for producing immediate forecasts of prices.
As stated, this model's objective was to estimate the efficient spread of the orderbook, based on the autocovariance of the price changes.
roll_r1 = visualizations.Model2_TS_observed(M2_1)
roll_r1.show(renderer = "notebook")
roll_r2 = visualizations.Model2_TS_Theoretical(M2_1)
roll_r2.show(renderer = "notebook")
Comparing the actual spread in the orderbooks with the efficient spread, we can conclude that this model produced an unsuccessful forecast of the spread in the orderbook.
This graph represents the distribution of all the spreads in the data. We can observe that the most frequent spread was around 2. This is far from the efficient spread value calculated by the model: $0.0700$